This tutorial describes how to perform the dowstream analysis on single-cell nano CUT&Tag data.
This tutorial assumes that the data have already been preprocessed, including demultiplexing of the different modalities, running cellranger and custom cell calling (see Set Up and Preprocessing).
The downstream analysis is composed by 6 steps:
1. Data loading (peaks, metadata and
fragments)
2. Quality controls (QC)
3. Merge Seurat objects
4. Normalisation and dimensional
reduction
5. Non-linear dimension reduction and
clustering
6. Annotation
~/NatProt/nanoscope/
folderfragments.tsv.gz, fragments.tsv.gz.tbi,
metadata.csv and peaks.broadPeak files and
store them in specific folders (see below section “1.1 Setting some
initial variables”)In this section we are going to load the input files for each modality and sample, create a common set of peaks for each sample, create a fragment object for each experiment, generate a peaks x cell matrix for each experiment and finally generate one Seurat object for experiment.
fragments.tsv.gz,
metadata.csv and peaks.broadPeak files.First, load all the required libraries.
library(Signac)
library(Seurat)
library(GenomicRanges)
library(future)
library(stringr)
library(ggplot2)
library(gghalves)
library(ggpubr)
library(EnsDb.Mmusculus.v79)
First, define some variables that will be used across the entire
vignette. Modify them according to your data.
Extremely important:
repodir the directory where the nanoscope
Github repo is. This is required in order to upload all the custom
functions needed to run this vignette# directory where the nanoscope repo was cloned
repodir <- "~/Documents/GitHub/nanoscope/" # change accordingly
source(paste0(repodir,"scripts/functions_scCT.R"))
wd the output root directory.sample_P23209
and sample_P24004 in a directory called result
and set it as working directory
wd <- "~/NatProt/nanoscope/results/"wd will be
/path/to/nanoscope_toy_data/results/wd is formatted as follow:fragments.tsv.gz is expected to be in a folder
with this path:
wd/{sample_name}/{modality_barcode}/cellranger/outs/metadata.csv is expected to be in a folder with
this path:
wd/{sample_name}/{modality_barcode}/cell_picking/{modality}_peaks.broadPeak is expected to be in a
folder with this path:
wd/{sample_name}/{modality_barcode}/peaks/macs_broad/Please note that nothing but the directories containing the
input data for each samples must be stored in
wd
# directory where the nanoscope output files are stored
wd <- "~/NatProt/nanoscope/results" # change accordingly
setwd(wd)
# Set the used genome version
genome <- "mm10"
# Genome annotations
genome_ann <- EnsDb.Mmusculus.v79
# Assay (peaks or bins)
assay <- "peaks"
# Samples
samples <- dir()
# Modalities
modalities <- dir(samples[1]) # this assumes sample1, sample2, sampleN share the same modalities and barcodes!
# Minimum number of cells a feature must be detected in to be included in the Seurat obj
min_cell <- 1
# Minium number of features a cell must have to be included in the Seurat obj
min_feat <- 1
Here we are going to load the peaks data for each modality, for each
sample.
For convenience, files will be stored in a list.
# Read-in peak coordinates for each modality iterating across samples and modalities
input.ls <- list()
for (smpl in samples) {
cat("Loading peaks for sample",smpl,"\n")
for (mod in modalities) {
cat("\t",mod,"\n")
# get mod name withouth barcode
mod2 <- strsplit(mod,"_")[[1]][1]
# Load peak files
input.ls[[paste0(mod,"_",smpl)]] <- read.table(paste0(smpl,"/",mod,"/peaks/macs_broad/",mod2,"_peaks.broadPeak"))[1:3]
colnames(input.ls[[paste0(mod,"_",smpl)]] ) <- c("chr", "start", "end")
}
}
## Loading peaks for sample sample_P23209
## ATAC_TATAGCCT
## H3K27ac_ATAGAGGC
## H3K27me3_CCTATCCT
## Loading peaks for sample sample_P24004
## ATAC_TATAGCCT
## H3K27ac_ATAGAGGC
## H3K27me3_CCTATCCT
Since the peaks were identified independently in each experiment, it is unlikely they will perfectly overlap. We thus need to merge the peaks from the different samples in order to create a common peak set, and quantify this peak set in each experiment prior to merging the objects.
This will be done by following the Signac vignette.
Having uploaded the peaks for all the modalities and samples, we will convert them into genomic ranges and create a common set of peaks in each sample by using GenomicRanges::reduce function.
First, convert peaks to genomic ranges
input.ls <- lapply(input.ls,function(x){ makeGRangesFromDataFrame(x) })
(We are going to frequently use the lapply function.
lapply applies a given R function to all the items (x) of a
given list)
Then, create a common set of peaks among the different samples. This way it is safe to merge objects from different samples into a single Seurat object (see below Merge Seurat objects)
combined.peaks.ls <- list()
for (mod in modalities) {
combined.peaks.ls[[mod]] <- reduce(x = c(input.ls[[paste0(mod,"_",samples[1])]],
input.ls[[paste0(mod,"_",samples[2])]]))
}
The combined.peaks list contains the common set of peaks
shared among the analysed modalities and samples:
combined.peaks.ls
## $ATAC_TATAGCCT
## GRanges object with 76108 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 3121329-3121554 *
## [2] chr1 3210615-3213108 *
## [3] chr1 3292774-3293707 *
## [4] chr1 3325412-3325622 *
## [5] chr1 3368173-3368684 *
## ... ... ... ...
## [76104] JH584304.1 491-17199 *
## [76105] JH584304.1 26439-28494 *
## [76106] JH584304.1 58819-60831 *
## [76107] JH584304.1 68562-114400 *
## [76108] GL456211.1 235128-235403 *
## -------
## seqinfo: 47 sequences from an unspecified genome; no seqlengths
##
## $H3K27ac_ATAGAGGC
## GRanges object with 70327 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 3042535-3043231 *
## [2] chr1 3325312-3325622 *
## [3] chr1 3368224-3368416 *
## [4] chr1 3412976-3413298 *
## [5] chr1 3514757-3515057 *
## ... ... ... ...
## [70323] JH584304.1 491-44360 *
## [70324] JH584304.1 48370-61344 *
## [70325] JH584304.1 68621-114400 *
## [70326] GL456393.1 40014-41631 *
## [70327] JH584292.1 13043-14058 *
## -------
## seqinfo: 37 sequences from an unspecified genome; no seqlengths
##
## $H3K27me3_CCTATCCT
## GRanges object with 57932 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 3361152-3362524 *
## [2] chr1 3667086-3678017 *
## [3] chr1 4412597-4416942 *
## [4] chr1 4425744-4431988 *
## [5] chr1 4487340-4509495 *
## ... ... ... ...
## [57928] JH584294.1 154717-155488 *
## [57929] JH584304.1 707-17155 *
## [57930] JH584304.1 23562-31962 *
## [57931] JH584304.1 59012-61030 *
## [57932] JH584304.1 68573-114394 *
## -------
## seqinfo: 35 sequences from an unspecified genome; no seqlengths
To quantify our combined set of peaks in each modality and sample, we
need to create a fragment object for each experiment (i.e., an
object holding all the information related to a single fragment file).
To this end we will use the Signac function
CreateFragmentObject which requires, for each experiment,
the fragments.tsv.gz and the metadata.csv
files.
Let’s start by loading the metadata data!
##### Metadata
# Load metadata for each experiment iterating across the sample list
metadata.ls <- list()
for (smpl in samples) {
cat("Loading peaks for sample",smpl,"\n")
for (mod in modalities) {
cat("\t",mod,"\n")
# Load metadata
metadata.ls[[paste0(mod,"_",smpl)]] <- read.csv(paste0(smpl,"/",mod,"/cell_picking/metadata.csv"),
stringsAsFactors = FALSE)
rownames(metadata.ls[[paste0(mod,"_",smpl)]]) <- metadata.ls[[paste0(mod,"_",smpl)]]$barcode
}
}
## Loading peaks for sample sample_P23209
## ATAC_TATAGCCT
## H3K27ac_ATAGAGGC
## H3K27me3_CCTATCCT
## Loading peaks for sample sample_P24004
## ATAC_TATAGCCT
## H3K27ac_ATAGAGGC
## H3K27me3_CCTATCCT
Then, we select only the cells that have passed internal filtering
# Filter the metadata, selecting only cells that have passed internal filtering ("passed_MB")
metadata.ls <- lapply(metadata.ls, function(x){x[x$passedMB,]})
Having loaded the metadta files, we can now create the fragment
objects for each experiment.
As explained in the Signac
vignette, the CreateFragmentObject function checks that
the file is present on disk, it is compressed and indexed, computes the
MD5 sum for the file and the tabix index so that we can tell if the file
is modified at any point, and checks that the expected cells
(metadata.csv) are present in the file.
##### Fragments
# Create one fragment object per each experiment iterating across the sample list
fragment.ls <- list()
for (smpl in samples) {
cat("Loading peaks for sample",smpl,"\n")
for (mod in modalities) {
cat("\t",mod,"\n")
# Load fragments
fragment.ls[[paste0(mod,"_",smpl)]] <- CreateFragmentObject(
path = paste0(smpl,"/",mod,"/cellranger/outs/fragments.tsv.gz"),
cells = metadata.ls[[paste0(mod,"_",smpl)]]$barcode)
}
}
## Loading peaks for sample sample_P23209
## ATAC_TATAGCCT
## H3K27ac_ATAGAGGC
## H3K27me3_CCTATCCT
## Loading peaks for sample sample_P24004
## ATAC_TATAGCCT
## H3K27ac_ATAGAGGC
## H3K27me3_CCTATCCT
head(fragment.ls[[paste0("ATAC_TATAGCCT_",samples[1])]])
| chrom | start | end | barcode | readCount |
|---|---|---|---|---|
| chr1 | 3003932 | 3004576 | GCCTAGGCAGTTACAC-1 | 1 |
| chr1 | 3003985 | 3004021 | CTACTTAAGTTCAGGG-1 | 7 |
| chr1 | 3005376 | 3005689 | CCAGATACAGCGTCGT-1 | 3 |
| chr1 | 3005764 | 3005913 | CAAGCTAGTTTGCATG-1 | 1 |
| chr1 | 3005861 | 3006012 | TCTCTGGTCCAGAGAG-1 | 1 |
| chr1 | 3006223 | 3006256 | AACAGTCCATCTGCAA-1 | 3 |
Now that each experiment has been associated to a fragment object
containing fragments and metadata, we can create a peaks x cell matrix
for each experiment. This will be done by using the Signac function
FeatureMatrix.
FeatureMatrix basically constructs a peaks x cell matrix by
parsing: i) fragments, ii) peaks and iii) metadata. Please, note how the
features here used to build the peaks x cell matrix are the
common set of peaks for each modalities identified in the previous step
(Creating a common set of peaks).
To speed-up the process, we launch FeatureMatrix with
process_n = 20000. This runs OK on our 64GB MacBook Pro,
but could go out-of-memory on other systems. If this happens, please,
decrease the process_n value (2000 should be fine for every
system).
# As usual, append the object to a list
counts.ls <- list()
# Iterate among the experiment names in order to retrieve fragments and cell barcodes from the same experiment
for (experim in names(fragment.ls)) {
cat("Analysing experiment: ",experim,"\n")
# modality name
modal <- paste0(str_split_fixed(experim,"_",4)[,1],"_",str_split_fixed(experim,"_",4)[,2])
cat("\tPeaks from modality: ",modal,"\n")
# Create FeatureMatrix
counts.ls[[experim]] <- FeatureMatrix(fragments = fragment.ls[[experim]], # fragments
features = combined.peaks.ls[[modal]], # modality common set of peaks
cells = metadata.ls[[experim]]$barcode, # metadata
process_n = 20000) # the higher, the faster, the more memory is used
}
## Analysing experiment: ATAC_TATAGCCT_sample_P23209
## Peaks from modality: ATAC_TATAGCCT
## Analysing experiment: H3K27ac_ATAGAGGC_sample_P23209
## Peaks from modality: H3K27ac_ATAGAGGC
## Analysing experiment: H3K27me3_CCTATCCT_sample_P23209
## Peaks from modality: H3K27me3_CCTATCCT
## Analysing experiment: ATAC_TATAGCCT_sample_P24004
## Peaks from modality: ATAC_TATAGCCT
## Analysing experiment: H3K27ac_ATAGAGGC_sample_P24004
## Peaks from modality: H3K27ac_ATAGAGGC
## Analysing experiment: H3K27me3_CCTATCCT_sample_P24004
## Peaks from modality: H3K27me3_CCTATCCT
counts.ls[[paste0(modal,"_",samples[1])]][1:10,1:25]
## 10 x 25 sparse Matrix of class "dgCMatrix"
##
## chr1-3361152-3362524 . . . . . . . . . . . . . . . . . . . . . . . . .
## chr1-3667086-3678017 2 1 . . . . . . . . . . . . . 2 . 2 . . 2 . . . 3
## chr1-4412597-4416942 . . . . . . . . . . . . . 1 . . . . . . . . . . 1
## chr1-4425744-4431988 . . . . . . . . . . . . . . . . . . . . . . . . .
## chr1-4487340-4509495 8 . 2 . 5 10 1 . . . . . . 1 . 1 . . . . 4 13 2 2 7
## chr1-4514289-4515435 1 . . . . . . . . . . . . 1 . . . . . . . . . . 2
## chr1-4568747-4578581 1 . . . 1 . . . . . 1 . . . . . . . . . . 1 . . .
## chr1-4592282-4595271 . . . . . 1 . . . . . . . . . . . . . . . . . . .
## chr1-4687932-4689799 . . . . . . . . . . . . . . . . . . . . . . . . .
## chr1-4785566-4794296 . . . . . . . . . . . . . . . . . . . . . . . . .
Now that we have generated a peaks x cell matrix for each experiment, we can store it in a Seurat object along with the fragment object and the metadata.
This is done for each experiment separately, objects will be merged in the next step. The reason for this is that we would like to first perform QC on each experiment individually and then merge all the experiments in a single Seurat object.
obj.ls <- list()
# Iterate among the experiment names in order to retrieve fragments and cell barcodes from the same experiment
for (experim in names(counts.ls)) {
# Get sample and modality name from the experiment variable
smpl <- str_split_fixed(experim,"_",3)[,3]
modality <- str_split_fixed(experim,"_",2)[,1]
# Create the chromatin assay
chrom.assay <- CreateChromatinAssay(counts = counts.ls[[experim]],
fragments = fragment.ls[[experim]],
genome = genome,
min.cells = min_cell,
min.features = min_feat)
# Create the object
obj.ls[[experim]] <- CreateSeuratObject(counts = chrom.assay,
assay = assay,
meta.data=metadata.ls[[experim]],
project = smpl)
# Add some metadata of origin
obj.ls[[experim]]$dataset <- experim
obj.ls[[experim]]$modality <- modality
obj.ls[[experim]]$sample <- smpl
}
Before merging the objects, we are going to do some QC on the individual experiments. In particular, we are going to drop cells with abnormal number of UMIs and too low percentage of fragments overlapping peaks. To define what is “abnormal” and what is not, we will use quantiles. Everything greater than the 95th or less than the 5th quantiles will be here considered as “abnormal” and therefore discarded.
Please note how we are not here applying any filtering on the TSS enrichment score as while this might be relevant for ATAC it is probably not for H3K27ac and H3K27me3. In case, however, you would like to apply this filtering we recommend to follow Signac’s tutorial on this.
obj.ls.qc <- list()
quant_high <- 0.95
quant_low <- 0.05
for (experiment in names(obj.ls)) {
# Calculate 5th and 95th quantiles for logUMI
logUMI_cutoff_high <- quantile(obj.ls[[experiment]]$logUMI, quant_high)
logUMI_cutoff_low <- quantile(obj.ls[[experiment]]$logUMI, quant_low)
# Calculate 5th quantile for logUMI for % fragments in peaks (here indicated by peak_ratio_MB)
peak_ratio_MB_cutoff_low <- quantile(obj.ls[[experiment]]$peak_ratio_MB, quant_low)
# Filter out cells outside these thresholds
obj.ls.qc[[experiment]] <- subset(obj.ls[[experiment]],
logUMI > logUMI_cutoff_low &
logUMI < logUMI_cutoff_high &
peak_ratio_MB > peak_ratio_MB_cutoff_low)
# print number of discarded cells
old_n_cell <- nrow(obj.ls[[experiment]][[]])
new_n_cell <- nrow(obj.ls.qc[[experiment]][[]])
discarded <- old_n_cell - new_n_cell
cat(experiment,"\n")
cat("\tdiscarded",discarded,"cells (",round(discarded/old_n_cell*100,2),"%)\n")
}
## ATAC_TATAGCCT_sample_P23209
## discarded 366 cells ( 13.58 %)
## H3K27ac_ATAGAGGC_sample_P23209
## discarded 379 cells ( 14.25 %)
## H3K27me3_CCTATCCT_sample_P23209
## discarded 350 cells ( 12.99 %)
## ATAC_TATAGCCT_sample_P24004
## discarded 382 cells ( 13.57 %)
## H3K27ac_ATAGAGGC_sample_P24004
## discarded 399 cells ( 14.1 %)
## H3K27me3_CCTATCCT_sample_P24004
## discarded 382 cells ( 13.52 %)
plotUMIcounts(obj = obj.ls, quantiles = c(quant_low,quant_high), feature = "logUMI")
plotUMIcounts(obj = obj.ls, quantiles = c(quant_low), feature = "peak_ratio_MB")
rm(obj.ls)
Now that we have one object for experiment containing only QC-passed cells, and that all the experiments share a common set of peaks, we can merge all the objects from different samples obtaining one Seurat object for modality.
combined.obj.ls <- list()
for (mod in modalities) {
# merge objects among the different samples
combined.obj.ls[[mod]] <- merge(obj.ls.qc[[paste0(mod,"_",samples[1])]],
y = obj.ls.qc[[paste0(mod,"_",samples[2])]],
add.cell.ids = c(samples[1], samples[2]) )
}
combined.obj.ls[["ATAC_TATAGCCT"]][["peaks"]]
## ChromatinAssay data with 75198 features for 4761 cells
## Variable features: 0
## Genome: mm10
## Annotation present: FALSE
## Motifs present: FALSE
## Fragment files: 2
combined.obj.ls[["H3K27ac_ATAGAGGC"]][["peaks"]]
## ChromatinAssay data with 69613 features for 4711 cells
## Variable features: 0
## Genome: mm10
## Annotation present: FALSE
## Motifs present: FALSE
## Fragment files: 2
combined.obj.ls[["H3K27me3_CCTATCCT"]][["peaks"]]
## ChromatinAssay data with 57176 features for 4788 cells
## Variable features: 0
## Genome: mm10
## Annotation present: FALSE
## Motifs present: FALSE
## Fragment files: 2
After having filter out low quality cells and having generated one Seurat object per each modality, we can proceed with normalisation and dimensionality reduction. Once again, we will follow Signac vignette.
RunTFIDF will be used. This is a two-step normalisation.
procedure, that both normalises across cells to correct for differences
in cellular sequencing depth, and across peaks to give higher values to
more rare peaks# Normalisation
combined.obj.ls <- lapply(combined.obj.ls,RunTFIDF)
# a warning (Some features contain 0 total counts) might rise. This is due to the cell filtering performed in the QC step. Should be safe to proceed
# Variable features
combined.obj.ls <- lapply(combined.obj.ls,FindTopFeatures)
# Dim reduction
combined.obj.ls <- lapply(combined.obj.ls,RunSVD)
It usually happens that the first LSI (you can think it as a
principal component, if you are more familiar with scrna-seq) is mostly
associated to the sequencing depth, rather than biological variation. We
will test this by running a modified version of the
DepthCor function. The modifications simply consists in
plotting the modality name as plot title.
plot.list <- lapply(combined.obj.ls,DepthCorMulMod)
ggarrange(plot.list[[1]],plot.list[[2]],plot.list[[3]],ncol=3)
Here, the correlation between the first LSI and sequencing depth is
quite strong. We will discard it from further analyses.
Now that we have normalised and found variable features in the data, we can perform graph-based clustering and non-linear dimension reduction for visualization. This is done individually for each modality. Although here we are using default parameters for all the modalites, sometimes it might be needed to adjust the parameters for each modality (e.g., number of PCs, clustering resolution, ..).
When working on a different dataset, you might need to adjust the resolution parameter. If so, we suggest using clustertreel.
# Run dimension reduction and clustering on each modality individually
# ATAC
combined.obj.ls$ATAC_TATAGCCT <- RunUMAP(combined.obj.ls$ATAC_TATAGCCT,reduction = 'lsi', dims = 2:40)
combined.obj.ls$ATAC_TATAGCCT <- FindNeighbors(combined.obj.ls$ATAC_TATAGCCT,reduction = 'lsi', dims = 2:40)
combined.obj.ls$ATAC_TATAGCCT <- FindClusters(combined.obj.ls$ATAC_TATAGCCT, verbose = FALSE, algorithm = 3)
# default resolution (.8) is here used
# algorithm = 3 means SLM algorithm is used for modularity optimization
# H3K27ac
combined.obj.ls$H3K27ac_ATAGAGGC <- RunUMAP(combined.obj.ls$H3K27ac_ATAGAGGC,reduction = 'lsi', dims = 2:40)
combined.obj.ls$H3K27ac_ATAGAGGC <- FindNeighbors(combined.obj.ls$H3K27ac_ATAGAGGC,reduction = 'lsi', dims = 2:40)
combined.obj.ls$H3K27ac_ATAGAGGC <- FindClusters(combined.obj.ls$H3K27ac_ATAGAGGC, verbose = FALSE, algorithm = 3)
# H3K27me3
combined.obj.ls$H3K27me3_CCTATCCT <- RunUMAP(combined.obj.ls$H3K27me3_CCTATCCT,reduction = 'lsi', dims = 2:40)
combined.obj.ls$H3K27me3_CCTATCCT <- FindNeighbors(combined.obj.ls$H3K27me3_CCTATCCT,reduction = 'lsi', dims = 2:40)
combined.obj.ls$H3K27me3_CCTATCCT <- FindClusters(combined.obj.ls$H3K27me3_CCTATCCT, verbose = FALSE, algorithm = 3)
# Plot each modality separately
p1=DimPlot(combined.obj.ls$ATAC_TATAGCCT, label = TRUE) + NoLegend() +
ggtitle("ATAC") +
theme(plot.title = element_text(size=15,hjust=0.5,face='bold'))
p2=DimPlot(combined.obj.ls$H3K27ac_ATAGAGGC, label = TRUE) + NoLegend() +
ggtitle("H3K27ac") +
theme(plot.title = element_text(size=15,hjust=0.5,face='bold'))
p3=DimPlot(combined.obj.ls$H3K27me3_CCTATCCT, label = TRUE) + NoLegend() +
ggtitle("H3K27me3") +
theme(plot.title = element_text(size=15,hjust=0.5,face='bold'))
ggarrange(p1,p2,p3,ncol=3)
Now that we have identified the different cell states, we need to
annotate them, assigning to each of them a biological identity. To this
end, we will estimate the activity of each gene by quantifying the
number of fragments mapping to its gene body and 2kb upstream the TSS.
This will be done by using the Signac function
GeneActivity.
Of course, this has to be done on modalities marking
transcriptionally active regions. In our case either ATAC or H3K27ac. We
believe that H3K27ac is the active modality which might best predict
gene activity, thus we will calculate the gene activity scores on nano
CUT&Tag H3K27ac modality.
Once again, for most of these processes, we will here follow the Signac
vignette.
Before creating a gene activity matrix we first need to add gene annotations to the H3K27ac Seurat object.
# get gene annotations from Ensembl
annotations <- GetGRangesFromEnsDb(ensdb = genome_ann)
# change to UCSC style since the data was mapped to mm10
seqlevelsStyle(annotations) <- 'UCSC'
# add the gene information to the object
Annotation(combined.obj.ls$H3K27ac_ATAGAGGC) <- annotations
Having added the gene annotations, we can calculate the gene
activity. This is done by the Signac GeneActivity function
that: i) extracts gene coordinates and extends them to include the 2 kb
upstream region and ii) counts the number of fragments for each cell
that map to each of these regions. Finally, the gene activities are
added to the Seurat object as ‘RNA’
# Calculate gene activities
gene.activities <- GeneActivity(combined.obj.ls$H3K27ac_ATAGAGGC)
# Add the gene activity matrix to the Seurat object as a new assay
combined.obj.ls$H3K27ac_ATAGAGGC[['RNA']] <- CreateAssayObject(counts = gene.activities)
# normalize RNA data
combined.obj.ls$H3K27ac_ATAGAGGC <- NormalizeData(
object = combined.obj.ls$H3K27ac_ATAGAGGC,
assay = 'RNA',
normalization.method = 'LogNormalize',
scale.factor = median(combined.obj.ls$H3K27ac_ATAGAGGC$nCount_RNA)
)
# Set RNA as default assay
DefaultAssay(combined.obj.ls$H3K27ac_ATAGAGGC) <- 'RNA'
By plotting the activities of known canonical marker genes we can get
some hints on the biological identities of most of the H3K27ac clusters.
Although the signal might sometimes be a bit noisy, already by plotting
the gene activity scores we can discriminate some cell states such as
astrocytes (telencephalon and non-telencephalon), vascular cells
(endothelial, smooth muscle, leptomeningeal cells), oligodendrocytes,
neurons (excitatory and inhibitory), microglia, macrophages, Bergmann
glia, olfactory ensheating cells and choroid plexus cells.
(These atlases are a great
resource to manually annotate your favorite cell state!)
For conciseness, we here opted for a manual annotation, but the
probably best practice would be to integrate one of the active
nanoCUT&Tag modalities with scRNA-seq data. If interested in doing
so, please follow this
Signac tutorial.
FeaturePlot(object = combined.obj.ls$H3K27ac_ATAGAGGC,
features = c('Irx2','Agt','Slc6a9'),
pt.size = 0.1,
max.cutoff = 'q95',
ncol = 3,
label = T)
FeaturePlot(
object = combined.obj.ls$H3K27ac_ATAGAGGC,
features = c('Gpr116','Cldn5','Emcn'),
pt.size = 0.1,
max.cutoff = 'q95',
ncol = 3,
label = T)
FeaturePlot(
object = combined.obj.ls$H3K27ac_ATAGAGGC,
features = c('Foxg1','Lhx2','Mfge8'),
pt.size = 0.1,
max.cutoff = 'q95',
ncol = 3,
label = T)
FeaturePlot(
object = combined.obj.ls$H3K27ac_ATAGAGGC,
features = c('Mag','Mbp'),
pt.size = 0.1,
max.cutoff = 'q95',
ncol = 2,
label = T)
FeaturePlot(
object = combined.obj.ls$H3K27ac_ATAGAGGC,
features = c('Neurod1','Neurod6'),
pt.size = 0.1,
max.cutoff = 'q95',
ncol = 2,
label = T)
FeaturePlot(
object = combined.obj.ls$H3K27ac_ATAGAGGC,
features = c('P2ry12','Ccr6','Selplg'),
pt.size = 0.1,
max.cutoff = 'q95',
ncol = 3,
label = T)
FeaturePlot(
object = combined.obj.ls$H3K27ac_ATAGAGGC,
features = c('Gad1','Slc32a1'),
pt.size = 0.1,
max.cutoff = 'q95',
ncol = 2,
label = T)
FeaturePlot(
object = combined.obj.ls$H3K27ac_ATAGAGGC,
features = c('A2m','Gdf10','Hopx'),
pt.size = 0.1,
max.cutoff = 'q95',
ncol = 3,
label = T)
FeaturePlot(
object = combined.obj.ls$H3K27ac_ATAGAGGC,
features = c('Map3k7cl','Myh11','Acta2'),
pt.size = 0.1,
max.cutoff = 'q95',
ncol = 3,
label = T)
FeaturePlot(
object = combined.obj.ls$H3K27ac_ATAGAGGC,
features = c('Frzb','Mgst1','Prss56'),
pt.size = 0.1,
max.cutoff = 'q95',
ncol = 3,
label = T)
FeaturePlot(
object = combined.obj.ls$H3K27ac_ATAGAGGC,
features = c('Dcn','Slc6a13'),
pt.size = 0.1,
max.cutoff = 'q95',
ncol = 2,
label = T)
FeaturePlot(
object = combined.obj.ls$H3K27ac_ATAGAGGC,
features = c('F13a1','Cd74'),
pt.size = 0.1,
max.cutoff = 'q95',
ncol = 2,
label = T)
FeaturePlot(
object = combined.obj.ls$H3K27ac_ATAGAGGC,
features = c('Clic6','Folr1'),
pt.size = 0.1,
max.cutoff = 'q95',
ncol = 2,
label = T)
Now we are ready to annotate the clusters! We are going to generate
here two layers of annotation, one more general and one more
specific.
Layer 2 (cell-state-specific)
##### Annotation layer2
# create a named annotation vector
l2.ids <- c("0"="AST-NT","1"="VEC","2"="AST-TE1","3"="OL","4"="EXC1",
"5"="MGL","6"="INH1","7"="AST-TE2","8"="BG","9"="VSMC",
"10"="OEC","11"="VLMC","12"="MAC","13"="CHP","14"="INH2",
"15"="EXC2")
names(l2.ids) <- levels(combined.obj.ls$H3K27ac_ATAGAGGC)
# rename the cluster identities with the layer2 annotations
combined.obj.ls$H3K27ac_ATAGAGGC <- RenameIdents(combined.obj.ls$H3K27ac_ATAGAGGC,l2.ids)
# Add layer2 annotations to the object
combined.obj.ls$H3K27ac_ATAGAGGC <- AddMetaData(combined.obj.ls$H3K27ac_ATAGAGGC,
Idents(combined.obj.ls$H3K27ac_ATAGAGGC),"layer2_annotation")
# UMAP plot
p1=DimPlot(combined.obj.ls$H3K27ac_ATAGAGGC, label = TRUE, repel = T) + NoLegend() +
ggtitle("H3K27ac") +
theme(plot.title = element_text(size=15,hjust=0.5,face='bold'))
Layer 1 (cell-type-specific)
##### General layer, L1
# create a named annotation vector
l1.ids <- c("AST-NT"="Astroependymal","VEC"="Vascular","AST-TE1"="Astroependymal",
"OL"="Oligodendrocytes","EXC1"="Neurons","MGL"="Immune","INH1"="Neurons",
"AST-TE2"="Astroependymal","BG"="Astroependymal","VSMC"="Vascular",
"OEC"="OEC","VLMC"="Vascular","MAC"="Immune","CHP"="Astroependymal",
"INH2"="Neurons","EXC2"="Neurons")
names(l1.ids) <- levels(combined.obj.ls$H3K27ac_ATAGAGGC)
# rename the cluster identities with the layer1 annotations
combined.obj.ls$H3K27ac_ATAGAGGC <- RenameIdents(combined.obj.ls$H3K27ac_ATAGAGGC,l1.ids)
# Add layer1 annotations to the object
combined.obj.ls$H3K27ac_ATAGAGGC <- AddMetaData(combined.obj.ls$H3K27ac_ATAGAGGC,
Idents(combined.obj.ls$H3K27ac_ATAGAGGC),"layer1_annotation")
# UMAP plot
p2=DimPlot(combined.obj.ls$H3K27ac_ATAGAGGC, label = F) +
ggtitle("H3K27ac") +
theme(plot.title = element_text(size=15,hjust=0.5,face='bold'))
Finally, transfer the layer1 and layer2 annotations from the H3K27ac to the H3K27me3 and ATAC modalities.
# H3K27ac annotated cell states
idents_l1 <- combined.obj.ls$H3K27ac_ATAGAGGC$layer1_annotation
idents_l2 <- combined.obj.ls$H3K27ac_ATAGAGGC$layer2_annotation
# Transfer to ATAC and H3K27me3
# layer1
combined.obj.ls$ATAC_TATAGCCT <- AddMetaData(combined.obj.ls$ATAC_TATAGCCT,
idents_l1,
col.name='layer1_annotation')
combined.obj.ls$H3K27me3_CCTATCCT <- AddMetaData(combined.obj.ls$H3K27me3_CCTATCCT,
idents_l1,
col.name='layer1_annotation')
# layer2
combined.obj.ls$ATAC_TATAGCCT <- AddMetaData(combined.obj.ls$ATAC_TATAGCCT,
idents_l2,
col.name='layer2_annotation')
combined.obj.ls$H3K27me3_CCTATCCT <- AddMetaData(combined.obj.ls$H3K27me3_CCTATCCT,
idents_l2,
col.name='layer2_annotation')
plotConnectModal(seurat = combined.obj.ls, group = 'layer1_annotation')
saveRDS(combined.obj.ls, file = "../nanoscope_final.rds")
sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.5.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] EnsDb.Mmusculus.v79_2.99.0 ensembldb_2.22.0
## [3] AnnotationFilter_1.22.0 GenomicFeatures_1.50.4
## [5] AnnotationDbi_1.60.2 Biobase_2.58.0
## [7] ggpubr_0.6.0 gghalves_0.1.4
## [9] ggplot2_3.4.1 stringr_1.5.0
## [11] future_1.32.0 GenomicRanges_1.50.2
## [13] GenomeInfoDb_1.34.9 IRanges_2.32.0
## [15] S4Vectors_0.36.2 BiocGenerics_0.44.0
## [17] SeuratObject_4.1.3 Seurat_4.3.0
## [19] Signac_1.9.0
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.3 spatstat.explore_3.1-0
## [3] reticulate_1.28 tidyselect_1.2.0
## [5] RSQLite_2.3.0 htmlwidgets_1.6.2
## [7] grid_4.2.2 BiocParallel_1.32.5
## [9] Rtsne_0.16 munsell_0.5.0
## [11] codetools_0.2-19 ica_1.0-3
## [13] miniUI_0.1.1.1 withr_2.5.0
## [15] spatstat.random_3.1-4 colorspace_2.1-0
## [17] progressr_0.13.0 filelock_1.0.2
## [19] highr_0.10 knitr_1.42
## [21] rstudioapi_0.14 ROCR_1.0-11
## [23] ggsignif_0.6.4 tensor_1.5
## [25] listenv_0.9.0 labeling_0.4.2
## [27] MatrixGenerics_1.10.0 GenomeInfoDbData_1.2.9
## [29] polyclip_1.10-4 farver_2.1.1
## [31] bit64_4.0.5 parallelly_1.34.0
## [33] vctrs_0.6.0 generics_0.1.3
## [35] xfun_0.37 biovizBase_1.46.0
## [37] BiocFileCache_2.6.1 R6_2.5.1
## [39] DelayedArray_0.24.0 bitops_1.0-7
## [41] spatstat.utils_3.0-2 cachem_1.0.7
## [43] promises_1.2.0.1 BiocIO_1.8.0
## [45] scales_1.2.1 nnet_7.3-18
## [47] gtable_0.3.2 globals_0.16.2
## [49] goftest_1.2-3 rlang_1.1.0
## [51] RcppRoll_0.3.0 splines_4.2.2
## [53] rtracklayer_1.58.0 rstatix_0.7.2
## [55] lazyeval_0.2.2 dichromat_2.0-0.1
## [57] checkmate_2.1.0 spatstat.geom_3.1-0
## [59] broom_1.0.4 yaml_2.3.7
## [61] reshape2_1.4.4 abind_1.4-5
## [63] backports_1.4.1 httpuv_1.6.9
## [65] Hmisc_5.0-1 tools_4.2.2
## [67] ellipsis_0.3.2 jquerylib_0.1.4
## [69] RColorBrewer_1.1-3 ggridges_0.5.4
## [71] Rcpp_1.0.10 plyr_1.8.8
## [73] base64enc_0.1-3 progress_1.2.2
## [75] zlibbioc_1.44.0 purrr_1.0.1
## [77] RCurl_1.98-1.10 prettyunits_1.1.1
## [79] rpart_4.1.19 deldir_1.0-6
## [81] pbapply_1.7-0 cowplot_1.1.1
## [83] zoo_1.8-11 SummarizedExperiment_1.28.0
## [85] ggrepel_0.9.3 cluster_2.1.4
## [87] magrittr_2.0.3 data.table_1.14.8
## [89] scattermore_0.8 lmtest_0.9-40
## [91] RANN_2.6.1 ProtGenerics_1.30.0
## [93] fitdistrplus_1.1-8 matrixStats_0.63.0
## [95] hms_1.1.2 patchwork_1.1.2
## [97] mime_0.12 evaluate_0.20
## [99] xtable_1.8-4 XML_3.99-0.14
## [101] gridExtra_2.3 compiler_4.2.2
## [103] biomaRt_2.54.0 tibble_3.2.0
## [105] KernSmooth_2.23-20 crayon_1.5.2
## [107] htmltools_0.5.4 later_1.3.0
## [109] Formula_1.2-5 tidyr_1.3.0
## [111] DBI_1.1.3 dbplyr_2.3.1
## [113] MASS_7.3-58.3 rappdirs_0.3.3
## [115] Matrix_1.5-3 car_3.1-1
## [117] cli_3.6.0 parallel_4.2.2
## [119] igraph_1.4.1 pkgconfig_2.0.3
## [121] GenomicAlignments_1.34.1 foreign_0.8-84
## [123] sp_1.6-0 plotly_4.10.1
## [125] spatstat.sparse_3.0-1 xml2_1.3.3
## [127] bslib_0.4.2 XVector_0.38.0
## [129] VariantAnnotation_1.44.1 digest_0.6.31
## [131] sctransform_0.3.5 RcppAnnoy_0.0.20
## [133] spatstat.data_3.0-1 Biostrings_2.66.0
## [135] rmarkdown_2.20 leiden_0.4.3
## [137] fastmatch_1.1-3 htmlTable_2.4.1
## [139] uwot_0.1.14 restfulr_0.0.15
## [141] curl_5.0.0 shiny_1.7.4
## [143] Rsamtools_2.14.0 rjson_0.2.21
## [145] lifecycle_1.0.3 nlme_3.1-162
## [147] jsonlite_1.8.4 carData_3.0-5
## [149] BSgenome_1.66.3 viridisLite_0.4.1
## [151] fansi_1.0.4 pillar_1.8.1
## [153] lattice_0.20-45 KEGGREST_1.38.0
## [155] fastmap_1.1.1 httr_1.4.5
## [157] survival_3.5-5 glue_1.6.2
## [159] png_0.1-8 bit_4.0.5
## [161] stringi_1.7.12 sass_0.4.5
## [163] blob_1.2.4 memoise_2.0.1
## [165] dplyr_1.1.0 irlba_2.3.5.1
## [167] future.apply_1.10.0